Phonological Features for 0-shot Multilingual Speech Synthesis
Code-switching, the intra-utterance use of multiple languages, is prevalent
across the world. Within text-to-speech (TTS), multilingual models have been
found to enable code-switching. By modifying the linguistic input to
sequence-to-sequence TTS, we show that code-switching is possible for languages
unseen during training, even within monolingual models. We use a small set of
phonological features derived from the International Phonetic Alphabet (IPA),
such as vowel height and frontness, consonant place and manner. This allows the
model topology to stay unchanged for different languages, and enables new,
previously unseen feature combinations to be interpreted by the model. We show
that this allows us to generate intelligible, code-switched speech in a new
language at test time, including the approximation of sounds never seen in
training. Comment: 5 pages, to be presented at INTERSPEECH 202
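As a rough illustration of the idea in this abstract, the sketch below encodes vowels as fixed-length vectors of phonological features, so a sound absent from training can still be expressed as a new combination of familiar feature values. The feature inventory, the toy "training" vowels and the function names are assumptions made for this example; they are not the feature set or code used in the paper.

```python
# Illustrative feature inventory (an assumption, not the paper's exact set).
FEATURES = {
    "height": ["close", "mid", "open"],
    "backness": ["front", "central", "back"],
    "rounding": ["unrounded", "rounded"],
}

# Toy set of vowels treated as "seen in training" for this example.
TRAIN_VOWELS = {
    "i": {"height": "close", "backness": "front", "rounding": "unrounded"},
    "u": {"height": "close", "backness": "back", "rounding": "rounded"},
    "a": {"height": "open", "backness": "front", "rounding": "unrounded"},
}


def encode(phone_features):
    """Concatenate one one-hot block per feature into a fixed-length vector."""
    vector = []
    for name, values in FEATURES.items():
        vector.extend(1 if phone_features[name] == v else 0 for v in values)
    return vector


def all_values_seen(phone_features, inventory):
    """Return True if every feature value occurs in at least one known phone."""
    return all(
        any(known[name] == value for known in inventory.values())
        for name, value in phone_features.items()
    )


# /y/ (close front rounded) is not in the toy training set, yet each of its
# feature values appears there: "close" and "front" in /i/, "rounded" in /u/.
y = {"height": "close", "backness": "front", "rounding": "rounded"}
y_vector = encode(y)
```

Because the vector length depends only on the feature inventory and not on any language's phoneme set, the model input keeps the same shape when a new language is introduced, which is one way to read the abstract's remark that the model topology stays unchanged.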
ADEPT: A dataset for evaluating prosody transfer
Text-to-speech is now able to achieve near-human naturalness and research
focus has shifted to increasing expressivity. One popular method is to transfer
the prosody from a reference speech sample. There have been considerable
advances in using prosody transfer to generate more expressive speech, but the
field lacks a clear definition of what successful prosody transfer means and a
method for measuring it.
We introduce a dataset of prosodically-varied reference natural speech
samples for evaluating prosody transfer. The samples include global variations
reflecting emotion and interpersonal attitude, and local variations reflecting
topical emphasis, propositional attitude, syntactic phrasing and marked
tonicity. The corpus only includes prosodic variations that listeners are able
to distinguish with reasonable accuracy, and we report these figures as a
benchmark against which text-to-speech prosody transfer can be compared.
We conclude the paper with a demonstration of our proposed evaluation
methodology, using the corpus to evaluate two text-to-speech models that
perform prosody transfer. Comment: 5 pages, 1 figure, accepted to Interspeech 202
Ctrl-P: Temporal control of prosodic variation for speech synthesis
Text does not fully specify the spoken form, so text-to-speech models must be
able to learn from speech data that vary in ways not explained by the
corresponding text. One way to reduce the amount of unexplained variation in
training data is to provide acoustic information as an additional learning
signal. When generating speech, modifying this acoustic information enables
multiple distinct renditions of a text to be produced.
Since much of the unexplained variation is in the prosody, we propose a model
that generates speech explicitly conditioned on the three primary acoustic
correlates of prosody: F0, energy and duration. The model is flexible
about how the values of these features are specified: they can be externally
provided, or predicted from text, or predicted then subsequently modified.
Compared to a model that employs a variational auto-encoder to learn
unsupervised latent features, our model provides more interpretable,
temporally-precise, and disentangled control. When automatically predicting the
acoustic features from text, it generates speech that is more natural than that
from a Tacotron 2 model with reference encoder. Subsequent human-in-the-loop
modification of the predicted acoustic features can significantly further
increase naturalness. Comment: To be published in Interspeech 2021. 5 pages, 4 figures
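The "predicted then subsequently modified" workflow described in this abstract can be pictured with a small sketch. It assumes one F0, energy and duration value per phone and uses simple multiplicative scaling; the `PhoneProsody` class, the `scale_span` helper, the units and the example values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhoneProsody:
    """Acoustic features for one phone (hypothetical representation)."""
    phone: str
    f0: float        # pitch, here in Hz (assumed unit)
    energy: float    # relative energy (assumed scale)
    duration: float  # here in milliseconds (assumed unit)


def scale_span(prosody, start, end, f0=1.0, energy=1.0, duration=1.0):
    """Return a copy in which phones in [start, end) have scaled features."""
    return [
        replace(
            p,
            f0=p.f0 * f0,
            energy=p.energy * energy,
            duration=p.duration * duration,
        )
        if start <= i < end
        else p
        for i, p in enumerate(prosody)
    ]


# Stand-in for values a model might predict from text.
predicted = [
    PhoneProsody("h", 110.0, 0.5, 60.0),
    PhoneProsody("aɪ", 120.0, 0.8, 80.0),
]

# Human-in-the-loop edit: raise the pitch and lengthen the second phone only.
edited = scale_span(predicted, 1, 2, f0=1.5, duration=1.5)
```

Keeping the edit as a pure function over an explicit feature sequence mirrors the abstract's point about interpretable, temporally precise control: the change touches only the chosen span and leaves the original prediction intact.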
The concise guide to PHARMACOLOGY 2013/14: G protein-coupled receptors
The Concise Guide to PHARMACOLOGY 2013/14 provides concise overviews of the key properties of over 2000 human drug targets with their pharmacology, plus links to an open access knowledgebase of drug targets and their ligands (www.guidetopharmacology.org), which provides more detailed views of target and ligand properties. The full contents can be found at http://onlinelibrary.wiley.com/doi/10.1111/bph.12444/full. G protein-coupled receptors are one of the seven major pharmacological targets into which the Guide is divided, with the others being ligand-gated ion channels, ion channels, catalytic receptors, nuclear hormone receptors, transporters and enzymes. These are presented with nomenclature guidance and summary information on the best available pharmacological tools, alongside key references and suggestions for further reading. A new landscape format has easy-to-use tables comparing related targets. It is a condensed version of material contemporary to late 2013, which is presented in greater detail and constantly updated on the website www.guidetopharmacology.org, superseding data presented in previous Guides to Receptors and Channels. It is produced in conjunction with NC-IUPHAR and provides the official IUPHAR classification and nomenclature for human drug targets, where appropriate. It consolidates information previously curated and displayed separately in IUPHAR-DB and the Guide to Receptors and Channels, providing a permanent, citable, point-in-time record that will survive database updates.